1.2 Applications

TABLE 1.2
Experimental results of some famous binary methods on ImageNet.

Methods               Weights  Activations  Model      Binarized Acc.    Full-precision Acc.
                                                       Top-1   Top-5     Top-1   Top-5
XNOR-Net [199]        Binary   Binary       ResNet-18  51.2    73.2      69.3    89.2
ABC-Net [147]         Binary   Binary       ResNet-50  70.1    89.7      76.1    92.8
LBCNN [109]           Binary   --           --         62.431  --        64.94   --
Bi-Real Net [159]     Binary   Binary       ResNet-34  62.2    83.9      73.3    91.3
PCNN [77]             Binary   Binary       ResNet-18  57.3    80.0      69.3    89.2
RBCN [148]            Binary   Binary       ResNet-18  59.5    81.6      69.3    89.2
BinaryDenseNet [12]   --       --           --         62.5    83.9      --      --
BNAS [36]             --       --           --         71.3    90.3      --      --

1.2.1 Image Classification

Image classification aims to assign images to different semantic classes. Many works regard performance on image classification as the criterion for the success of BNNs. Five datasets are commonly used for image classification tasks: MNIST [181], SVHN, CIFAR-10 [122], CIFAR-100, and ImageNet [204]. Among them, ImageNet is the most difficult to train on and consists of 1,000 classes of images. Table 1.2 shows the experimental results of some of the most popular binary methods on ImageNet.
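To make the binarization behind these methods concrete, the sign-based scheme popularized by XNOR-Net approximates a real-valued weight tensor W by alpha * sign(W), where alpha is a scaling factor chosen as the mean absolute weight. The following is a minimal NumPy sketch under those assumptions; the function name and the single per-tensor scaling factor are illustrative simplifications (XNOR-Net computes alpha per filter):

```python
import numpy as np

def binarize_weights(W):
    """Approximate a real-valued tensor W by alpha * sign(W),
    with alpha = mean(|W|) (a simplified XNOR-Net-style scheme)."""
    alpha = np.abs(W).mean()   # scalar scaling factor
    B = np.sign(W)             # entries in {-1, 0, +1}
    B[B == 0] = 1              # map zeros to +1 for a pure {-1, +1} code
    return alpha, B

W = np.array([[0.5, -1.2],
              [0.3, -0.4]])
alpha, B = binarize_weights(W)
# alpha ≈ 0.6 (mean of |W|); B = [[1, -1], [1, -1]]
```

At inference time the network stores only the 1-bit tensor B and the scalar alpha, which is where the memory savings reported in Table 1.2's source papers come from.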

1.2.2 Speech Recognition

Speech recognition is a technique or capability that enables a program or system to process and recognize human speech. Binary methods make speech recognition feasible on edge computing devices.

Xiang et al. [252] applied binary DNNs to speech recognition tasks. Experiments on TIMIT phone recognition and 50-hour Switchboard speech recognition show that binary DNNs can run about four times faster than standard DNNs during inference, with a relative accuracy loss of roughly 10.0%.

Zheng et al. [290] and Yin et al. [273] have also implemented binarized CNN-based speech recognition systems.

1.2.3 Object Detection and Tracking

Object detection is the process of locating a target within a scene, while object tracking follows a target across consecutive frames of a video.

Sun et al. [218] propose a fast object detection algorithm based on BNNs. Compared to full-precision convolution, this new method results in 62 times faster convolutional operations and 32 times memory saving in theory.
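The source of such speedups is that, once weights and activations are constrained to {-1, +1} and packed into machine words, a floating-point multiply-accumulate becomes a bitwise XNOR followed by a popcount, and storing 1 bit per value instead of a 32-bit float yields the 32x memory saving. A minimal sketch of this idea (the bit encoding and function name are illustrative assumptions, not the cited paper's implementation):

```python
def binary_dot(a_bits, b_bits, n):
    """Dot product of two {-1, +1} vectors of length n, each packed
    into an integer (bit 1 encodes +1, bit 0 encodes -1).
    XNOR marks positions where the factors agree (product = +1);
    popcount then recovers the signed dot product."""
    mask = (1 << n) - 1
    matches = ~(a_bits ^ b_bits) & mask        # XNOR, masked to n bits
    popcount = bin(matches).count("1")         # number of agreeing positions
    return 2 * popcount - n                    # agreements minus disagreements

# Vectors (+1, -1, +1, +1) and (+1, +1, -1, +1), packed as 0b1011 and 0b1101:
# dot = 1*1 + (-1)*1 + 1*(-1) + 1*1 = 0
print(binary_dot(0b1011, 0b1101, 4))  # 0
```

One XNOR-plus-popcount pair processes an entire machine word (e.g., 64 weight products) at once, which is why the theoretical speedup over scalar floating-point convolution is so large.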
